TCR Clustering Demo
Load Data
# file paths
cd8_filepath <- here::here("./data/CD8_Stims.rds")
# download
invisible(capture.output(Rdiscvr::DownloadOutputFile(
outputFileId = 785508,
outFile = cd8_filepath,
overwrite = FALSE
)))
# read
seuratObj_CD8s <- readRDS(cd8_filepath)
# collapse the different gag stims into a single category
seuratObj_CD8s$Stim <- ifelse(
grepl("CM9|Gag", seuratObj_CD8s$Stim),
yes = "Gag",
no = seuratObj_CD8s$Stim
)
Clustering
Omitted: Run TCR Clustering (Probably already done on prime-seq for your data)
Tuning Clustering Parameters
TCR clustering is not a one-size-fits-all procedure. Different datasets, TCR chains, organisms, experimental designs, and biological questions may require different clustering parameters.
The fastest way to judge whether a TCR clustering is suitable is to
use the tooling in tcrClustR to visualize the clusters in
their native distance space.
The resulting heatmaps and histograms show which parameter values the data call for.
The VisualizeTcrDistances function iterates through each
of the different clusterings/distances calculated on the TCR data, and
shows the distance distributions within and between clusters.
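To make "within and between clusters" concrete, here is a minimal base R sketch of the comparison. It does not call VisualizeTcrDistances (its arguments are not shown here); the toy coordinates and cluster labels are invented, and dist() stands in for a real TCR distance.

```r
# Toy data (invented): six "receptors" on a 1-D axis forming two obvious groups.
x <- c(0, 1, 2, 50, 51, 52)
cluster_id <- c(1, 1, 1, 2, 2, 2)

# Pairwise distance matrix; a stand-in for a TCR distance matrix.
d <- as.matrix(dist(x))

# Count each unordered pair once (upper triangle), split by co-membership.
same_cluster <- outer(cluster_id, cluster_id, "==")
pair <- upper.tri(d)
within_d <- d[pair & same_cluster]
between_d <- d[pair & !same_cluster]

# A well-separated clustering has small within-cluster distances
# and clearly larger between-cluster distances.
summary(within_d)
summary(between_d)
```

If the two distributions overlap heavily, the clusters are probably being cut at the wrong height for that distance type.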
Two choices matter most as you begin to tune the clustering parameters:
- Which chain are you clustering on? (TRA, TRB, or both)
- Are you using full-length distances or just CDR3?
tcrClustR enumerates the different combinations of these two factors
and then clusters them, but does so with a single
dianaHeight parameter value.
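As a rough picture of what that enumeration looks like, the sketch below builds the grid of chain and distance-type combinations with a single shared height. The labels and the value 50 are illustrative assumptions, not tcrClustR's internal naming or default.

```r
# Illustrative labels only; tcrClustR may name these differently.
chains <- c("TRA", "TRB", "TRA+TRB")
distance_types <- c("full_length", "CDR3")

# Every chain x distance-type combination gets clustered...
combos <- expand.grid(
  chain = chains,
  distance_type = distance_types,
  stringsAsFactors = FALSE
)

# ...but all of them share one height (50 is an arbitrary example value).
combos$dianaHeight <- 50
combos
```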
Notice in the heatmaps above that the color scale varies in
magnitude. This is because the distance distributions vary between
chains and distance types. A dianaHeight of 50 may
be reasonable for TRB full-length distances, too large for TRA_CDR3, and
too small for TRA+TRB full-length distances.
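The effect of scale can be sketched on toy data. This assumes dianaHeight is the height at which a cluster::diana tree is cut, which the parameter name suggests but this section does not confirm. The points and the three scale factors are invented to mimic small, medium, and large distance magnitudes.

```r
library(cluster)  # ships with R as a recommended package

# Toy data (invented): eight points in three well-separated groups.
x <- c(0, 10, 20, 100, 110, 120, 300, 310)
base_dist <- as.matrix(dist(x))

# Hypothetical scale factors standing in for different distance types.
scales <- c(TRA_CDR3 = 0.3, TRB_full = 1, TRA_TRB_full = 4)

# Same tree shape, same cut height, different distance magnitudes.
n_clusters <- sapply(scales, function(s) {
  tree <- as.hclust(diana(as.dist(base_dist * s)))
  length(unique(cutree(tree, h = 50)))
})
n_clusters
```

The same cut of 50 merges groups when distances are small and splits them when distances are large, so the cluster count rises with the scale even though the underlying structure is identical.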
Depending on your assessment, you can re-run the clustering locally
with a different dianaHeight parameter to better suit your
data. The re-run is quick. You can restrict it to specific chains, but
re-running all chains at a given height is just as easy.
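When choosing a new height for one distance type, it can help to sweep a few candidates and watch the cluster count. The sketch below does this on toy data with cluster::diana directly, because the tcrClustR re-run call is not shown in this section. The data and candidate heights are invented.

```r
library(cluster)  # ships with R as a recommended package

# Toy data (invented): eight points in three well-separated groups.
x <- c(0, 10, 20, 100, 110, 120, 300, 310)
tree <- as.hclust(diana(dist(x)))

# Candidate cut heights to compare (arbitrary example values).
candidate_heights <- c(5, 15, 50, 150, 400)

# Number of clusters produced by cutting the tree at each height.
n_clusters <- sapply(candidate_heights, function(h) {
  length(unique(cutree(tree, h = h)))
})

data.frame(dianaHeight = candidate_heights, n_clusters = n_clusters)
```

A range of heights over which the count stays stable is usually a safer choice than a value sitting right at a jump.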